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HIGHLIGHTS 

• Bias may arise when comparing the value-added scores of teachers who work in different schools. 

• Some schools are more effective than others by virtue of their favorable resources, leadership, or 
organization; we can expect that teachers of similar skill will perform better in these more effective 
schools. 

• Some schools have better contextual conditions than others, providing students with more positive 
influences - peers who benefit from safe neighborhoods and strong community support. These 
conditions may facilitate instruction, thus will tend to increase a teacher's value-added score. 

• Value-added models statistically control for student background and previously demonstrated 
student ability. But these controls tend to be ineffective, and possibly even misleading, when we 
compare teachers whose classrooms vary greatly by these factors. 

• There are several methods for checking the sensitivity of value-added scores to school variation in 
contextual conditions and student backgrounds. 

• If value-added scores are sensitive to these factors, we can revise the analysis to ensure that the 
classrooms being compared are similar on measures of student background and school composition, 
thus reducing the risk of bias. 


INTRODUCTION 

This brief considers the problem of using value-added scores to compare teachers who work in different 
schools. My focus is on whether such comparisons can be regarded as fair, or, in statistical language, 
"unbiased." An unbiased measure does not systematically favor teachers because of the backgrounds of 
the students they are assigned to teach, nor does it favor teachers working in resource-rich classrooms 
or schools. A key caveat: A measure that is unbiased does not mean the measure is accurate. An 
unbiased measure could be imprecise - thus inaccurate - if, for example, it is based on a small sample of 
students or on a test with too few items. I will not consider the issue of statistical precision here, having 
considered it in a previous brief . 1 This brief focuses strictly on the bias that may arise when comparing 
the value-added scores of teachers who work in different schools. 


CHALLENGES THAT ARISE IN COMPARING TEACHERS WHO WORK IN DIFFERENT 
SCHOOLS 

In a previous brief, Goldhaber and Theobold showed that how teachers rank on value-added can depend 
strongly on whether those teachers are compared to colleagues working in the same school or to 
teachers working in different schools . 2 This discrepancy by itself does not mean that between-school 
comparisons are biased, or that comparisons between teachers working in the same school are 
unbiased. However, previous literature identifies three unique challenges that arise in comparing 
teachers who work in different schools, and each brings a risk of bias. 
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First, some schools are more effective than others by virtue of their favorable resources, leadership, or 
organization. We can expect that teachers of similar skill will perform better in these more effective 
schools. Second, some schools have more favorable contextual conditions than others, providing a 
student with more favorable peers - those who benefit from strong community support and 
neighborhood safety. These contextual conditions may facilitate instruction, thus tending to increase a 
teacher's value-added score. Third, value-added models use statistical controls to make allowances for 
students' backgrounds and abilities. These controls tend to be ineffective, and possibly misleading, 
when we compare teachers whose classrooms vary greatly in the prior ability or other characteristics of 
their students. This problem can be particularly acute when we compare teachers in different schools 
serving very different populations of students. It can also arise when we compare teachers who work in 
the same school but who serve very different sub-populations, as with teachers in high schools in which 
students are tracked by ability. 3 

After describing each of these challenges, I consider ways to check the sensitivity of value-added scores 
to variations in schools' contextual conditions and students' background. If value-added scores are 
sensitive to these factors, we can revise the analysis to ensure that the classrooms being compared are 
similar on measures of student background and school composition, thus reducing the risk of bias. In 
this revised analysis, the aim is to compare teachers who work with similar students in similar schools. 
While policymakers may debate the utility of such comparisons, such comparisons are better supported 
by the available data than are comparisons between teachers who work with very different subsets of 
students and under different conditions. Thus they are less vulnerable to bias. 

1. Variation in school effectiveness. 

Emerging evidence suggests that some schools are more effective than others in managing resources, 
creating cultures for learning, and providing instructional support. An effectively organized school may 
provide benefits to all teachers working in that school. If so, teachers assigned to such schools will look 
better on value-added measures than will teachers who work in less effective schools. 

Social scientists have for years debated whether schools vary substantially in their effectiveness, and if 
so, why. In his landmark 1966 report "Equality of Educational Opportunity," sociologist James S. 

Coleman suggested that socio-economic segregation of schools contributed to variation in learning but 
that factors such as facilities and spending mattered little. 4 The Coleman Report cast a long shadow over 
the proposition that giving schools more resources would improve education. However, Coleman's and 
other early studies were based on cross-sectional data. Early value-added modeling of schools based on 
longitudinal data suggested that students with similar backgrounds experience very different growth 
rates depending on the schools they attend. 5 These and more recent longitudinal studies raised the 
question of whether the internal life of schools, and in particular differences in leadership and collegial 
support, are more important than student composition in promoting learning. 6 

Recent randomized experiments provide compelling evidence that schools have highly varied effects. 
These studies capitalize on the fact that new charter schools are often oversubscribed: more students 
apply than can be admitted. By law, applicants to these charter schools are offered admission on the 
basis of a randomized lottery. Researchers are now following the outcomes of winners and losers of 
these lotteries. A study of randomized lotteries in 36 charter schools found that being admitted to a 
charter school made little difference in outcomes, on average. However, the variation in the impact of 
being so assigned was substantial. This result gives strong causal evidence not that charter schools per 
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se are particularly effective, but that some schools are substantially more effective than others. 7 Other 
researchers developed a model to predict this variation and found that five policies, including "frequent 
feedback to teachers, the use of data to guide instruction, high-dosage tutoring, increased instructional 
time, and high expectations," explain approximately 50 percent of the variation in school effectiveness. 8 
Leaders in effective charter schools take pains to ensure that teachers follow school-wide procedures 
and norms. These randomized studies corroborate work showing that effective school leadership, 
professional work communities, and even school safety during a base year predict changes in the value 
that a school adds to learning. 9 

Separating the contribution of school leadership and resources from the average contribution of teacher 
skill is challenging, however. A critic of the review above might reasonably argue that what makes a 
school effective is nothing more than the average skill level of its teachers. However, several recent 
studies provide evidence against this criticism. In two of these studies, experimenters randomly assigned 
whole schools to innovative school-wide instructional curricula, revealing substantial positive effects on 
student learning. In these cases, the teaching force remained stable, yet the introduction of a new 
school-wide curriculum created added value that cannot be attributed simply to the aggregate quality of 
the teaching force. 10 Another recent study followed the value-added scores of teachers as they moved 
from one school to another. This study provided evidence that a teacher's value-added score will tend to 
improve when that teacher moves to a school in which other teachers have high value-added scores. 

This evidence suggests that teachers learn from high-skill peers. 11 Teacher collaboration and peer 
learning may also augment the impact of school-level factors such as the coherence of the curriculum, 
the availability of instructional materials, and the length of the school day or year. 12 In sum, comparisons 
between the value-added scores of teachers who work in different schools confound teacher skill and 
school effectiveness. Current value-added technology provides no means by which to separate these 
influences, a fact that defines a challenge to future research. Thus we have good reason to suspect that 
school effectiveness biases comparisons of the value-added scores of teachers working in different 
schools. 

2. Variation in peers. 

Within a district, schools tend to serve quite different sets of students. High-achieving students tend to 
be clustered in schools in which peers are highly motivated, parents are committed to the success of the 
school, and the surrounding neighborhood is safe. 13 These are presumably favorable conditions for 
instruction. Relatedly, sociological research has established that teachers tend to calibrate the content 
and pacing of their instruction according to the average prior achievement of their students, 14 implying 
that these well-prepared students will learn faster in classes with high-ability peers. Moreover, there is 
evidence that teachers themselves believe they are more effective when teaching higher-ability 
students than when teaching lower-ability students. 15 We have good reason, then, to think that 
favorable peer composition facilitates school and teaching effectiveness. The problem at hand is that, in 
principle, standard value-added models cannot isolate the impact of school organization or teacher 
expertise if those factors are correlated with peer motivation, parent commitment, neighborhood safety, 
and other local conditions. 16 The reason is that although value-added models may include measures of 
peer composition such as average prior ability or average family socioeconomic status, the value-added 
of the teacher or school is unobserved. If the peer composition and value-added are correlated - as 
suggested by the research - we have no way of isolating value-added. 17 Indeed, an attempt to control 
for peer composition when estimating value-added may introduce extra bias into value-added scores. 18 
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The connection between peer composition and instructional effectiveness likely plays out very 
differently at the elementary and secondary levels. Elementary schools draw students from local 
neighborhoods that tend to be quite segregated with respect to family income and race/ethnicity. As a 
result, elementary schools tend to be comparatively internally homogeneous with respect to student 
background. In contrast, large, comprehensive public secondary schools draw students from multiple 
elementary schools and thus tend to be internally more heterogeneous than are elementary schools. In 
response, high schools typically assign students to classrooms based on their perceived ability. This 
process of "tracking" can generate large differences among classrooms . 19 Harris considers the special 
problems that arise in studying teacher value-added within secondary schools that use tracking , 20 and 
reasons that problems of peer composition are more pronounced within high schools than they are 
within elementary schools. These effects are likely to be particularly important when we compare 
teachers who work in different schools, even elementary schools. 

3. "Common support" and statistical adjustment for student background. 

The problem of statistically adjusting for student background is distinct from that of isolating peer 
effects or differences in school effectiveness. Even if peer effects were negligible and all schools were 
equally effective, classroom composition could bias value-added. For example, two classrooms taught by 
equally skilled teachers might display different learning rates simply because one classroom had more 
able students. Random assignment of students to teachers would solve this problem, but random 
assignment doesn't happen in practice, so statisticians have invented adjustments to control for student 
background factors that predict future achievement. The problem is that, in general, we cannot rely on 
standard methods of statistical adjustment to work well when the backgrounds of children attending 
different classrooms vary substantially. Statisticians call this the failure of "common support," and it is 
more likely to occur when we compare teachers in different elementary schools than when we compare 
teachers in the same elementary school. This problem is also likely to arise in comparisons of teachers 
within high schools that use tracking. 

To understand how statistical adjustments work, consider the comparisons of two teachers, A and B, 
who teach students having different prior average achievement. Statistical adjustment is based on a 
statistical model that predicts how Teacher A's students would have done if they had been assigned to 
Teacher B and how Teacher B's students would have done if they had been assigned to Teacher A. If the 
statistical model is based on good background information, such as prior test scores that strongly predict 
future test scores, this may work very well. In particular, if the two groups of students overlap 
considerably in their background, the data will have good predictive information about how each set of 
students would do in either classroom. This is a case in which the two groups have good "common 
support" for the model. 

However, if these two distributions do not overlap, we have a problem - a failure of common support. 
Suppose, in the worst case, that all of Teacher A's students have higher prior achievement than any of 
Teacher B's students. In this case the data have no information about how A's students would do in B's 
class or how well B's students would do in A's class; there is simply no valid comparison group for either 
teacher. In this case, the value-added score will simply be an extrapolation - a guess based on the 
analyst's belief about whether the relationship between prior background and future test score is linear 
or, in some known way, non-linear . 21 In essence, the comparison of value-added scores is not based on 
the data but is entirely based on the analyst's assumptions about the model. This extreme case - no 
overlap in the two distributions - is unlikely to arise in practice. The key point is that the smaller the 
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overlap in the two distributions of prior background, the less information the data can provide about the 
comparative effectiveness of the two teachers. This problem is especially acute if there is any reason to 
suspect that teachers are differentially effective for students of different backgrounds. In that case, it is 
essential to compare teachers serving similar children to draw any valid causal conclusions. 22 Central to 
our discussion is the fact that, at the elementary school level, a lack of common support is more likely to 
occur in comparisons between teachers working in different schools than in comparisons between 
teachers working in the same school. A failure of common support is also likely to arise in comparisons 
among high school teachers working in the same school if that school tracks students on the basis of 
ability. It is possible and useful to check comparability in any value-added analysis, a topic to which I 
return in the concluding section. 

Exacerbating any failure of common support is the inherent uncertainty in achievement test scores. 
Suppose one school draw students from the upper end of the achievement distribution while another 
school draws students from the lower end. Suppose that students in these schools make, on average, a 
five-point gain in achievement. To say that these gains represent an equivalent amount of learning 
requires unwarranted assumptions about the achievement test. We can speak confidently of our 
measurements of things like time and distance, but measuring gains in cognitive skill is much more 
difficult. On some tests, low-achieving students can easily make comparatively large score gains simply 
because the test has a large number of easy items. In this case, teachers working in low-scoring schools 
would produce inflated value-added scores. On other tests, it will be comparatively easy for high- 
achieving students to make large gains, biasing value-added in favor of teachers working in those 
schools. Simply put, our current technology for constructing tests does not allow us to make strong 
claims about the relative gains of students who start from very different places in the achievement 
distribution. Our testing technology works much better when are comparing gains made by students of 
similar background and prior skill. 

In sum, efforts to compare teachers working in schools that serve children from widely varied 
backgrounds are vulnerable to bias. In a typical school district about 15-20 percent of the total variation 
in students' average incoming achievement lies between schools. 23 This means that students attending a 
high-achieving school will tend to score around 1.5 standard deviations higher, on average, than 
students attending a low-achieving school. To compare teachers working in such a wide range of schools 
may include a risk of failure of common support, as well as introduce substantial peer effects, increasing 
the risk of bias when we compare the value-added scores of teachers working in different schools. 


EMPIRICAL EVIDENCE OF BIAS 

As we have seen, researchers have found that teachers rank quite differently when they are compared 
to colleagues in the same school than when they are compared to teachers in other schools. This finding 
leads us to predict that comparing teachers in different schools produces more bias than does 
comparing teachers in the same school. However, we need to see whether empirical evidence supports 
such predictions. What do we know about the magnitude of bias that arises in each case? 

Comparing teachers within schools 

Several randomized experiments lend support to the idea that value-added scores are approximately 
unbiased. In one large-scale study, students were randomly assigned to teachers. No statistical 
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adjustments were needed to correct for bias, so a comparison between classrooms within the same 
school can be regarded as a comparison of "true value-added." The analysts found that the variation in 
such value-added scores was quite large, indicating that teachers do, in fact, vary substantially in their 
effectiveness. Moreover, the variation in value-added in this experiment was similar in magnitude to the 
variation in value-added typically found in conventional non-randomized value-added analysis. 24 Two 
more recent experiments assessed the bias of value-added scores more directly. Value-added scores 
were computed conventionally for one year using standard methods of statistical adjustment. During 
the next year, pairs of teachers within schools were randomly assigned to student rosters, enabling the 
researchers to compute unbiased value-added scores with no statistical adjustment. Comparisons 
between the conventional and experimental value-added scores provided some evidence that the 
conventional scores were approximately unbiased. 25 

All of these encouraging studies used experimental evidence to investigate the bias of using 
conventional value-added scores to compare teachers working in the same school. A key question is 
whether such encouraging results can be found for comparing teachers in different schools. Here the 
research base is sparser. Perhaps the most important study of this type followed 2.5 million children in 
grades 3-8 into adulthood. The researchers found that students assigned to high value-added teachers 
had higher educational attainment, earnings, and wealth as adults. 26 The researchers tested the 
potential bias of the value-added scores in two ways. First, using parental tax data, they compared 
students who had experienced high value-added teachers to those who had experienced low value- 
added teachers. They found no association between teacher value-added and these socioeconomic 
measures, providing evidence against the claim that unmeasured family characteristics had biased the 
value-added scores. Secondly, they compared a school's average achievement before and after a "high- 
value-added" teacher had left the school. They found that when a school loses a teacher with a high 
value-added score, the school's achievement tends to decrease. This finding is important, because it 
supports the claim that the value-added score has causal content, and it supports the finding that 
within-school differences in teacher value-added reflect real differences in effectiveness. Nevertheless, 
the teacher value-added scores computed in this study, despite reflecting differences in teacher 
effectiveness, are vulnerable to bias. At least part of the variation in teacher value-added may have 
reflected differences in school organizational effectiveness or differences in community and peer effects. 
The researchers did not consider the possibility that high-value-added teachers work in schools that are 
effectively managed, and that attending such an effective school is key to students' future success. Nor 
did they test whether omitted school-level variables were associated with teacher value-added. Instead, 
the authors assumed implicitly that any differences between schools must reflect differences in the 
individual skill of teachers working in those schools. A re-analysis of these data could estimate the 
contribution of school value-added to adult success to assess whether an alternative explanation based 
on school differences is plausible. 


CONCLUSIONS AND RECOMMENDATIONS 

This brief has considered sources of potential bias when we use value-added scores to compare teachers 
working in different schools. A growing body of evidence suggests that schools can vary substantially in 
their effectiveness, potentially inflating the value-added scores of teachers assigned to effective schools. 
Schools also vary in contextual conditions such as parental expectations, neighborhood safety, and peer 
influences that may directly support learning or that may contribute to school and teacher effectiveness. 
Moreover, schools vary substantially in the backgrounds of the students they serve, and conventional 
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statistical methods tend to break down when we compare teachers serving very different subsets of 
students. 

Although we know that there is great potential for bias when we compute value-added scores for 
teachers working in different schools, we do not yet know the extent to which this potential is actually 
realized. Nor do we have conclusive evidence regarding the extent to which teachers are particularly 
effective or ineffective for particular kinds of students. This uncertainty poses a challenge to those who 
wish to interpret teacher value-added scores. One way to address this challenge is to check the 
sensitivity of value-added results to variations across schools in effectiveness and student composition. 
Several approaches come to mind. 

First, following Goldhaber and Theobold, 27 one can compute value-added scores two ways: by 
comparing teachers within schools and by comparing teachers without regard to their school 
assignment. If the rankings are consistent, we have little reason to favor the within-school comparisons. 
My colleagues and I did this, not with value-added scores but with student perceptions of teaching 
quality using seven indicators of teacher effectiveness based on the Tripod Survey Assessments of 
Ronald Ferguson from Harvard University. 28 Our results, taken from data on a large urban district, were 
highly convergent. Correlations between indicators computed in these two different ways ranged 
from .91 to .96 across the seven dimensions, with a mean of .94. This was not surprising, because school 
differences accounted for little of the variation in Tripod: Only 2-7 percent of the variation in these 
indicators lay between schools. Given these convergent results, there is little reason to believe that 
school differences were adding extra bias in these indicators based on student perceptions. (As a further 
check, I recommend computing the percentile rank of teachers under the two procedures; that is, 
comparing teachers who work in the same school and comparing teachers without reference to the 
school in which they work). 29 These findings are not surprising: Students are likely to base their 
perceptions of teaching quality on experiences with teachers in the same or similar schools, which likely 
explains why the fraction of variation between schools in student perceptions is comparatively small. In 
contrast, as mentioned, value-added scores tend to vary considerably between schools. 

What if results are not convergent? A sensible strategy is to divide schools into subsets that serve rather 
similar students. One might then use value-added scores (or other indicators) to compare teachers who 
work in the same subset of schools. To check the sensitivity of those measures, one might again check 
convergence between two sets of estimates: those that compare teachers within schools and those that 
compare teachers working in different schools but in the same subset of schools. If these are convergent, 
we can assume that the decision to compare teachers working in different schools (within the same 
subset of schools) has not contributed bias to the value-added scores. 30 

Shavelson and Wiley suggest a refined version of this approach: For each school, select a subset of 
schools that match that school in terms of student characteristics. Call this the reference set for a 
particular school. Ideally, each school is located in the middle of its reference set with respect to the 
distribution of expected achievement gains. 31 

An additional check is to compute the "contextual effect" of student composition, as is common in 
research in educational sociology. This indicator, along with a measure of variation between schools in 
school-mean prior achievement, is diagnostic of the bias that plausibly arises from school 
heterogeneity. 32 The same procedure can be used to assess whether classrooms within a school are too 
heterogeneous in student background to support unbiased value-added. I would recommend using this 
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procedure (see appendix), particularly in the case of secondary schools that track students to classrooms 
on the basis of ability. 

These sensitivity checks, and the possible stratification of teachers into sub-groups serving similar 
students, complicate value-added analysis and may not be congruent with policymakers' wish to 
compare all teachers in a district. However, these steps may win the approval of teachers who want to 
be sure that comparisons between themselves and other teachers are free of bias. Moreover, I 
recommend this modified approach as scientifically responsible, for it limits us to answering questions 
that our data can actually answer. 

APPENDIX 

Raudenbush and Willms (1995) wrote down a fairly general linear model for value-added, then used a 
very simple model for illustration. The illustration is informative for this brief. We have an outcome Y - tj , 

a linear function of a covariate X - defined for each student / within classroom j (/ = 1,..., ; j = 1 ,.../) . 

We can orthogonally decompose the regression within and between classrooms as 


where X.j is the classroom mean of the covariate, X.. is the overall ("grand") mean; /^characterizes 

the association between X and Y within classrooms; and (3 h characterizes the association between and 

the classroom mean of the covariate and the classroom mean outcome. By re-centering the covariate, 
we obtain a re-parameterization of (Al) known in the sociology literature as the "contextual effects 
model." 


Here f3 c = J3 b - jB w is the "contextual effect" (the association between X.^and the mean outcome 
controlling for person-level X tj ); Uj and e tj are random effects at the classroom and student levels 
respectively; and we assume 

Ignorable student-level error E(e tj I X) = E(e) = 0 Assumption (i) 

We allow for confounding between, but do not, however, assume E(U . I X) = E(U j) = 0 . Let us now 
average the outcome within classrooms to obtain 



(Al) 



(A2) 



(A3) 


Type A and Type B Effects 
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Now we can define two effects. The first is the Type A effect, of interest to parents choosing classrooms, 
that is 


£(X., -*..) + £/, 

This is composed of two components. The first is the contextual component J3 c (X.j - X..) , 

attributable to the composition of the classroom with respect to the covariate X.j . This might 

represent the impact of peer composition, the quality of the resources available to the teacher, or the 
supportiveness of the parents. The second is the random effect U jf attributable to the effectiveness of 

teacher / s practice. 

The Type B effect, of interest to teacher evaluators, is thus the random effect U . itself. That is, 

b j sU j 

We shall assume that this key estimand - the true value added for teacher j, has zero mean and variance 


Var(Uj) = t 2 (A4) 

Identification 

Under Assumption (i) in connection with A2, we can identify J3 W , and hence the Type A effect by 
subtraction 


A. * Y.j -J3 0 - fi w (X.. - X..) = P c {X.. - X..) + Uj + e.j 
= A j + «v 

Equation (A5) is what economists would call the estimated "fixed effect." 


(A5) 


Under an additional strong identification assumption Cov(U , , X.j ) = 0 , we can identify the Type B 
effect by subtraction as 




However, if Cov(U j9 X.j) ^ 0, we will estimate with bias, that is, the least squares estimator jB h 
will have expectation 

£(A) = A + &, <A7) 

where J3-=Cov(U.,X..)/<jI and is the variance of the sample mean of the covariate X.. (Note 
this variance will be non-zero even if n j is infinite, assuming classrooms vary in composition). As a result. 
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if Cov(JJ j,X . ■) ^ 0, (6) does not approximate the true type B effect but instead estimates a quantity 
we might called the "prima facie" effect, that is 

B PFj =U j -j3 wc (X. j -X..) (A8) 


A Way Forward 

We can put a bound on how biased our analysis will be using the following result Raudenbush and 
Willms (1995): 

Var(B PFj ) < Var(Bj) < Var(A } ) 

or (A9) 

(1-/4 )r 1 2 < r 2 < (£ +(5jo\ + (l-/4)r 2 

We can consistently estimate Var(B PFj ) and, if Assumption (i) is plausible, we can consistently estimate 
Var(Aj) . We can therefore put bounds around the variance of the true value-added effects. Moreover, 
it follows from (A9) that the variance of the bias is 

o</4r 2 = c#+/U 2 4 (aio) 

If (A5) is reasonable, we can estimate the upper bound as 

(A+/U 2 4 *A 2 4 (aid 

If the estimated contextual effect is null or if the classroom mean X.j varies little, we need not worry. If 
their product is large, our biases are large in magnitude. We might then back away from the universal 
"Type B" effect and instead block on classrooms with similar X mj . This would narrow the scope of the 
reference population for teacher j to include other teachers whose classrooms have similar composition. 

Note schools are not represented in our model. Hence the Type B effect is in fact the sum of the teacher 
value-added plus the effectiveness of the school. 
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